The field of agriculture is in urgent need of modernization: checking whether plants are growing correctly still demands extensive manual work. Despite advances in agricultural technology, workers must still sort and recognize different plants and weeds, which costs considerable time and effort over the long term. This trillion-dollar industry is ripe for technological innovations that cut down on the need for manual labor, and this is where Artificial Intelligence can genuinely benefit its workers: AI and Deep Learning can greatly shorten the time and energy required to identify plant seedlings. Doing so more efficiently, and even more accurately, than experienced manual labor could lead to better crop yields, free up human involvement for higher-order agricultural decision making, and, in the long term, result in more sustainable environmental practices in agriculture.
The aim of this project is to build a Convolutional Neural Network (CNN) that classifies plant seedlings into their respective categories.
The Aarhus University Signal Processing group, in collaboration with the University of Southern Denmark, has recently released a dataset containing images of unique plants belonging to 12 different species.
Due to the large volume of data, the images have been converted into an images.npy file and the labels into Labels.csv, so that you can work on the project seamlessly without having to worry about the high data volume.
The goal of the project is to create a classifier capable of determining a plant's species from an image.
List of Species (12)
# Installing the libraries with the specified version.
# uncomment and run the following line if Google Colab is being used
# !pip install tensorflow==2.15.0 scikit-learn==1.2.2 seaborn==0.13.1 matplotlib==3.7.1 numpy==1.25.2 pandas==1.5.3 opencv-python==4.8.0.76 -q --user
# Installing the libraries with the specified version.
# uncomment and run the following lines if Jupyter Notebook is being used
#!pip install tensorflow==2.13.0 scikit-learn==1.2.2 seaborn==0.11.1 matplotlib==3.3.4 numpy==1.24.3 pandas==1.5.2 opencv-python==4.8.0.76 -q --user
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import math
from PIL import Image
import cv2
from sklearn.model_selection import train_test_split # Function for splitting datasets for training and testing.
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix, classification_report
import tensorflow as tf
# Keras Sequential Model
from tensorflow.keras.models import Sequential
# Importing all the different layers and optimizers
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization, Activation, LeakyReLU, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam,SGD
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.models import Model
# For Transfer Learning
from tensorflow.keras.applications import InceptionV3
# For Data Augmentation
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# The below code can be used to ignore the warnings that may occur due to deprecations
import warnings
warnings.filterwarnings("ignore")
# Uncomment and run the below code if you are using google colab
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
# Load the Labels CSV file into a NumPy array
labels = np.loadtxt('/content/drive/MyDrive/Personal/UT Austin/Project 5 - CV/Labels.csv', delimiter=',', dtype=str, skiprows=1)
# Load the images.npy file
images = np.load('/content/drive/MyDrive/Personal/UT Austin/Project 5 - CV/images.npy')
images.shape
(4750, 128, 128, 3)
We have 4,750 color images (3 channels) of 128x128 pixels.
labels.shape
(4750,)
# Get the count of unique values
num_unique_labels = len(np.unique(labels))
print(f"Number of unique species: {num_unique_labels}")
Number of unique species: 12
# Get unique values and their counts
unique_labels, counts = np.unique(labels, return_counts=True)
# Display the unique values with their counts
for label, count in zip(unique_labels, counts):
print(f"{label}: {count}")
Black-grass: 263
Charlock: 390
Cleavers: 287
Common Chickweed: 611
Common wheat: 221
Fat Hen: 475
Loose Silky-bent: 654
Maize: 221
Scentless Mayweed: 516
Shepherds Purse: 231
Small-flowered Cranesbill: 496
Sugar beet: 385
And here we have the corresponding labels for the 4,750 images, the different label values and counts per value. As we can see, there is some imbalance in the number of images per label value.
def display_random_images(image_array, labels, rows=4, cols=6):
"""
Display a grid of random images with their corresponding labels.
Parameters:
- image_array: The array containing images (shape: (num_images, height, width, channels)).
- labels: Array of labels corresponding to the images.
- rows: Number of rows in the plot grid (default is 4).
- cols: Number of columns in the plot grid (default is 6).
"""
# Define the number of unique classes (optional, you can use it elsewhere if needed)
num_classes = len(np.unique(labels))
# Create a figure with the specified size
fig = plt.figure(figsize=(15, 12))
# Loop through each subplot and display random images
for i in range(cols):
for j in range(rows):
# Generate a random index to select a random image and label
random_index = np.random.randint(0, len(labels))
# Add a subplot to the grid
ax = fig.add_subplot(rows, cols, i * rows + j + 1)
# Plot the image
ax.imshow(image_array[random_index])
# Set the title to the corresponding label
ax.set_title(labels[random_index])
# Remove axis ticks for better visualization
ax.axis('off')
# Display the plot
plt.show()
display_random_images(images, labels, rows=4, cols=6)
The images above are in BGR format. The leaves still look green because the green channel occupies the same position in both orderings; however, the rocks appear blue, which is unnatural and is the main clue that the images are not in the RGB format that Matplotlib's imshow function expects.
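A quick way to sanity-check channel order programmatically is to compare mean channel intensities: soil and rocks push the red channel up in RGB images but the blue channel up in BGR ones. A minimal sketch of the heuristic, using a synthetic array in place of the real image batch (the helper name is illustrative):

```python
import numpy as np

def channel_means(imgs):
    """Per-channel mean intensity for an (N, H, W, 3) image array."""
    return imgs.reshape(-1, 3).mean(axis=0)

# Synthetic stand-in for a batch of images: the first channel is much brighter.
batch = np.zeros((2, 4, 4, 3), dtype=np.uint8)
batch[..., 0] = 200
means = channel_means(batch)
# means -> array([200., 0., 0.]); if channel 0 dominated real outdoor images
# read as RGB, they would look "very red", hinting the data is actually BGR.
```

On the seedling dataset, an unusually high mean in the channel read as "blue" would support the BGR hypothesis before eyeballing the plots.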
# Function to create labeled bar plots from a 1D array
def labeled_barplot(data, perc=False, n=None):
"""
Barplot with count or percentage labels at the top.
data: 1D array or list with categorical data.
perc: whether to display percentages instead of counts (default is False).
n: displays the top n categories (default is None, i.e., display all).
"""
# Get counts of unique values
counts = Counter(data)
# Optionally, select the top n categories
if n is not None:
counts = dict(counts.most_common(n))
categories = list(counts.keys())
values = list(counts.values())
total = sum(values) # Total count of all elements
# Plot setup
plt.figure(figsize=(len(categories) + 2, 6))
ax = sns.barplot(x=categories, y=values, palette="Paired")
plt.xticks(rotation=90, fontsize=15)
# Annotate bars with counts or percentages
for p in ax.patches:
if perc:
label = "{:.1f}%".format(100 * p.get_height() / total) # Percentage
else:
label = int(p.get_height()) # Count
x = p.get_x() + p.get_width() / 2
y = p.get_height()
ax.annotate(
label,
(x, y),
ha="center",
va="center",
size=12,
xytext=(0, 5),
textcoords="offset points"
)
plt.show()
labeled_barplot(labels, perc=False, n=None)
There is some mild imbalance in the number of samples for each plant species.
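Besides the augmentation-based balancing used later in this notebook, a lighter-weight option for mild imbalance is to leave the data as-is and weight the loss per class via `model.fit(class_weight=...)`. A sketch of inverse-frequency weights, mirroring scikit-learn's "balanced" heuristic, with illustrative demo labels:

```python
import numpy as np

# Inverse-frequency class weights: n_samples / (n_classes * count_per_class).
labels_demo = np.array(['a', 'a', 'a', 'b'])
classes, counts = np.unique(labels_demo, return_counts=True)
class_weight = {i: len(labels_demo) / (len(classes) * c)
                for i, c in enumerate(counts)}
# class_weight -> {0: 0.666..., 1: 2.0}: each sample of the rare class 'b'
# counts three times as much in the loss as a sample of the common class 'a'.
```

The keys are the integer class indices that a LabelEncoder would produce, which is the form Keras expects.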
# Initialize an empty array to store the RGB images
rgb_images = np.zeros_like(images)
# Loop through and convert each image from BGR to RGB
for i in range(images.shape[0]):
rgb_images[i] = cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB)
display_random_images(rgb_images, labels, rows=4, cols=6)
After converting the images from BGR to RGB, the rocks in the image now look much more natural.
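As a side note, the per-image cv2.cvtColor loop can be replaced by a single vectorized NumPy operation, since BGR-to-RGB is just a reversal of the channel axis. A sketch on a tiny synthetic array:

```python
import numpy as np

# BGR -> RGB by reversing the last (channel) axis; no per-image loop needed.
bgr = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)  # one tiny 2x2 "image"
rgb = bgr[..., ::-1]
# Every pixel's channels are swapped end-to-end: rgb[i, j, 0] == bgr[i, j, 2].
```

The same slicing works unchanged on the full (4750, 128, 128, 3) array and returns a view rather than allocating per-image copies.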
As the images are fairly large, training on them may be computationally expensive; therefore, it is preferable to reduce the image size from 128x128 to 64x64.
resized_images = np.zeros((rgb_images.shape[0], 64, 64, 3), dtype=images.dtype) # Prepare an empty array
for i in range(images.shape[0]):
img = Image.fromarray(images[i].astype('uint8')) # Convert NumPy array to PIL image
img_resized = img.resize((64, 64)) # Resize the image to 64x64
resized_images[i] = np.array(img_resized) # Convert back to NumPy array
# Loop through and convert each image from BGR to RGB
for i in range(resized_images.shape[0]):
resized_images[i] = cv2.cvtColor(resized_images[i], cv2.COLOR_BGR2RGB)
print(f"New shape of images: {resized_images.shape}")
New shape of images: (4750, 64, 64, 3)
display_random_images(resized_images, labels, rows=4, cols=6)
Split the dataset
X = resized_images
y = labels
# Step 1: Split into 80% train and 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# Resulting splits:
# X_train, y_train -> 80% of the data (training set)
# X_test, y_test -> 20% of the data (test set)
# Print the shapes of the resulting datasets
print("Training set shape:", X_train.shape, y_train.shape)
print("Test set shape:", X_test.shape, y_test.shape)
Training set shape: (3800, 64, 64, 3) (3800,)
Test set shape: (950, 64, 64, 3) (950,)
# Creating one-hot encoded representation of target labels
# we can do this by using this utility function - https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical
# 1. Use LabelEncoder to encode the unique labels
label_encoder = LabelEncoder()
label_encoder.fit(labels) # Fit on the full labels array to capture all unique values
# 2. Encode y_train and y_test using the fitted LabelEncoder
y_train_encoded = label_encoder.transform(y_train)
y_test_encoded = label_encoder.transform(y_test)
# 3. Apply one-hot encoding using tf.keras.utils.to_categorical
y_train_encoded = tf.keras.utils.to_categorical(y_train_encoded)
y_test_encoded = tf.keras.utils.to_categorical(y_test_encoded)
# Verify the shapes
print("One-hot encoded y_train shape:", y_train_encoded.shape)
print("One-hot encoded y_test shape:", y_test_encoded.shape)
One-hot encoded y_train shape: (3800, 12)
One-hot encoded y_test shape: (950, 12)
As we know, image pixel values range from 0 to 255. Here we simply divide all pixel values by 255 to normalize the images so that their pixel values lie between 0 and 1.
# Normalizing the image pixels
X_train = X_train/255
X_test = X_test/255
#Verify the values are now in the 0 to 1 range
X_train[0]
array([[[0.40784314, 0.35686275, 0.32156863],
[0.39215686, 0.34901961, 0.30196078],
[0.38039216, 0.34901961, 0.29803922],
...,
[0.28627451, 0.21568627, 0.17254902],
[0.27843137, 0.21176471, 0.15686275],
[0.30980392, 0.24313725, 0.17647059]],
[[0.4 , 0.35686275, 0.30980392],
[0.4 , 0.35294118, 0.30196078],
[0.38431373, 0.35294118, 0.30196078],
...,
[0.30980392, 0.24705882, 0.2 ],
[0.30588235, 0.24705882, 0.19215686],
[0.30588235, 0.24313725, 0.18431373]],
[[0.40784314, 0.36862745, 0.31372549],
[0.40784314, 0.36862745, 0.30980392],
[0.4 , 0.36862745, 0.31764706],
...,
[0.30980392, 0.24705882, 0.20392157],
[0.30588235, 0.24705882, 0.2 ],
[0.31372549, 0.25490196, 0.2 ]],
...,
[[0.42352941, 0.38431373, 0.34509804],
[0.40392157, 0.37647059, 0.32941176],
[0.34117647, 0.30588235, 0.25490196],
...,
[0.40392157, 0.3372549 , 0.27058824],
[0.40392157, 0.3372549 , 0.26666667],
[0.40392157, 0.33333333, 0.2627451 ]],
[[0.41568627, 0.36862745, 0.33333333],
[0.40392157, 0.37254902, 0.32941176],
[0.33333333, 0.29803922, 0.24705882],
...,
[0.38431373, 0.31372549, 0.25490196],
[0.37647059, 0.30980392, 0.24705882],
[0.38431373, 0.31372549, 0.23921569]],
[[0.40784314, 0.36078431, 0.3372549 ],
[0.39607843, 0.36078431, 0.33333333],
[0.34117647, 0.29803922, 0.2627451 ],
...,
[0.37647059, 0.30980392, 0.25490196],
[0.36078431, 0.30588235, 0.24705882],
[0.36470588, 0.30980392, 0.24313725]]])
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
# Fixing the seed for random number generators
import random
np.random.seed(42)
random.seed(42)
tf.random.set_seed(42)
Let's build a CNN Model.
The model has 2 main parts: a convolutional feature-extraction stack (Conv2D and MaxPooling2D blocks) and a fully connected classifier head. The flow of our model is shown below:
# Initializing a sequential model
model = Sequential()
# Adding the first conv layer with 64 filters and kernel size 3x3; padding 'same' keeps the output size equal to the input size
# input_shape denotes the dimensions of our 64x64 RGB seedling images
model.add(Conv2D(64, (3, 3), activation='relu', padding="same", input_shape=(64, 64, 3)))
# Adding max pooling to reduce the size of output of first conv layer
model.add(MaxPooling2D((2, 2), padding = 'same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding="same"))
model.add(MaxPooling2D((2, 2), padding = 'same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding="same"))
model.add(MaxPooling2D((2, 2), padding = 'same'))
# flattening the output of the conv layer after max pooling to make it ready for creating dense connections
model.add(Flatten())
# Adding a fully connected dense layer with 100 neurons
model.add(Dense(100, activation='relu'))
# Adding the output layer with 12 neurons and activation functions as softmax since this is a multi-class classification problem
model.add(Dense(12, activation='softmax'))
# Using SGD Optimizer, NOTE: I tried this SGD initially, but Adam optimizer yielded better results
# opt = SGD(learning_rate=0.01, momentum=0.9)
# Using Adam Optimizer
opt = Adam()
# Compile model
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
# Generating the summary of the model
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Layer (type)                   ┃ Output Shape       ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ conv2d (Conv2D)                │ (None, 64, 64, 64) │   1,792 │
│ max_pooling2d (MaxPooling2D)   │ (None, 32, 32, 64) │       0 │
│ conv2d_1 (Conv2D)              │ (None, 32, 32, 32) │  18,464 │
│ max_pooling2d_1 (MaxPooling2D) │ (None, 16, 16, 32) │       0 │
│ conv2d_2 (Conv2D)              │ (None, 16, 16, 32) │   9,248 │
│ max_pooling2d_2 (MaxPooling2D) │ (None, 8, 8, 32)   │       0 │
│ flatten (Flatten)              │ (None, 2048)       │       0 │
│ dense (Dense)                  │ (None, 100)        │ 204,900 │
│ dense_1 (Dense)                │ (None, 12)         │   1,212 │
└────────────────────────────────┴────────────────────┴─────────┘
Total params: 235,616 (920.38 KB)
Trainable params: 235,616 (920.38 KB)
Non-trainable params: 0 (0.00 B)
history_1 = model.fit(
X_train, y_train_encoded,
epochs=15,
validation_split=0.1,
shuffle=True,
batch_size=64,
verbose=2
)
Epoch 1/15
54/54 - 5s - 91ms/step - accuracy: 0.1444 - loss: 2.4182 - val_accuracy: 0.2605 - val_loss: 2.2661
Epoch 2/15
54/54 - 1s - 11ms/step - accuracy: 0.3275 - loss: 2.0030 - val_accuracy: 0.4368 - val_loss: 1.6214
Epoch 3/15
54/54 - 1s - 10ms/step - accuracy: 0.4427 - loss: 1.5981 - val_accuracy: 0.5158 - val_loss: 1.3068
Epoch 4/15
54/54 - 1s - 10ms/step - accuracy: 0.5129 - loss: 1.3850 - val_accuracy: 0.5342 - val_loss: 1.2501
Epoch 5/15
54/54 - 1s - 12ms/step - accuracy: 0.5865 - loss: 1.2094 - val_accuracy: 0.5947 - val_loss: 1.0731
Epoch 6/15
54/54 - 1s - 10ms/step - accuracy: 0.6365 - loss: 1.0474 - val_accuracy: 0.6342 - val_loss: 0.9915
Epoch 7/15
54/54 - 1s - 12ms/step - accuracy: 0.6754 - loss: 0.9359 - val_accuracy: 0.6658 - val_loss: 0.8916
Epoch 8/15
54/54 - 1s - 10ms/step - accuracy: 0.6968 - loss: 0.8667 - val_accuracy: 0.7105 - val_loss: 0.8351
Epoch 9/15
54/54 - 1s - 12ms/step - accuracy: 0.7354 - loss: 0.7709 - val_accuracy: 0.7158 - val_loss: 0.8119
Epoch 10/15
54/54 - 1s - 12ms/step - accuracy: 0.7535 - loss: 0.7124 - val_accuracy: 0.7237 - val_loss: 0.7920
Epoch 11/15
54/54 - 1s - 11ms/step - accuracy: 0.7725 - loss: 0.6672 - val_accuracy: 0.7368 - val_loss: 0.7389
Epoch 12/15
54/54 - 1s - 12ms/step - accuracy: 0.7830 - loss: 0.6287 - val_accuracy: 0.7211 - val_loss: 0.7939
Epoch 13/15
54/54 - 1s - 21ms/step - accuracy: 0.7962 - loss: 0.5688 - val_accuracy: 0.7421 - val_loss: 0.7892
Epoch 14/15
54/54 - 1s - 12ms/step - accuracy: 0.8099 - loss: 0.5221 - val_accuracy: 0.7447 - val_loss: 0.7944
Epoch 15/15
54/54 - 1s - 12ms/step - accuracy: 0.8307 - loss: 0.4701 - val_accuracy: 0.7211 - val_loss: 0.8585
plt.plot(history_1.history['accuracy'])
plt.plot(history_1.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
train_accuracy_1 = round(history_1.history['accuracy'][-1], 3)
val_accuracy_1 = round(history_1.history['val_accuracy'][-1], 3)
# Compute loss and accuracy on test data
test_loss_1, test_accuracy_1 = model.evaluate(X_test, y_test_encoded, verbose=2)
test_loss_1 = round(test_loss_1, 3)
test_accuracy_1 = round(test_accuracy_1, 3)
print(f"Test Loss: {test_loss_1}")
print(f"Test Accuracy: {test_accuracy_1}")
30/30 - 1s - 32ms/step - accuracy: 0.7053 - loss: 0.9008
Test Loss: 0.901
Test Accuracy: 0.705
#Confusion Matrix
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test_encoded, axis=1) # Decode one-hot encoded y_test_encoded
# Compute confusion matrix
cm = confusion_matrix(y_true, y_pred)
class_names = label_encoder.classes_ # Get class names from the label encoder
# Plot the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Confusion Matrix")
plt.show()
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
#Save a copy if this will be the final model
final_model = model
X_test_final = X_test
y_test_final = y_test
y_test_encoded_final = y_test_encoded
label_encoder_final = label_encoder
Reducing the Learning Rate:
Hint: Use the ReduceLROnPlateau() callback, which decreases the learning rate by some factor if the loss has not improved for some time. Training may then resume reducing the loss at the smaller learning rate. If the loss still does not decrease, the reduction may be triggered again in an attempt to achieve a lower loss.
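The core plateau rule can be sketched in plain Python (illustrative only; Keras' actual ReduceLROnPlateau also supports min_delta thresholds and cooldown periods not modeled here):

```python
def reduce_on_plateau(losses, lr, factor=0.5, patience=3, min_lr=1e-6):
    """Toy version of the plateau rule: scale `lr` by `factor` whenever the
    monitored loss fails to improve for `patience` consecutive epochs."""
    best, wait, history = float('inf'), 0, []
    for loss in losses:
        if loss < best:
            best, wait = loss, 0   # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:   # plateau detected: reduce the rate
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Loss stalls after epoch 2, so the rate is halved once patience runs out:
lrs = reduce_on_plateau([1.0, 0.8, 0.8, 0.8, 0.8], lr=0.001)
# lrs -> [0.001, 0.001, 0.001, 0.001, 0.0005]
```

The real callback is configured later in this notebook; this sketch only shows why the learning rate column in the training log can drop mid-run.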
Here we will use data augmentation to balance the number of training images for each label. Data augmentation is not applied to the validation or test data sets.
# Since we only want to perform augmentation on the training data we'll split out train, val, and test separately
X = resized_images
y = labels
# Initial split: 80% train, 20% test
X_train_val, X_test, y_train_val, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# Second split: from the 80% train_val set, split off 10% for validation
X_train, X_val, y_train, y_val = train_test_split(
X_train_val, y_train_val, test_size=0.125, random_state=42, stratify=y_train_val
)
# Resulting splits:
# X_train, y_train -> 70% of the data (training set)
# X_val, y_val -> 10% of the data (validation set)
# X_test, y_test -> 20% of the data (test set)
# Print the shapes of the resulting datasets
print("Training set shape:", X_train.shape, y_train.shape)
print("Validation set shape:", X_val.shape, y_val.shape)
print("Test set shape:", X_test.shape, y_test.shape)
Training set shape: (3325, 64, 64, 3) (3325,)
Validation set shape: (475, 64, 64, 3) (475,)
Test set shape: (950, 64, 64, 3) (950,)
#Here we can see the unbalanced categories in the training data
labeled_barplot(y_train, perc=False, n=None)
# Determine the count of the most frequent label across the training data
unique, counts = np.unique(y_train, return_counts=True)
# Find the maximum count value across all classes
target_count = counts.max()
print("Maximum count per class (target count for augmentation):", target_count)
Maximum count per class (target count for augmentation): 458
# Augmentation Parameters
train_datagen = ImageDataGenerator(
rotation_range=20, # Random rotations
width_shift_range=0.1, # Horizontal shifts
height_shift_range=0.1, # Vertical shifts
shear_range=0.1, # Shear transformations
zoom_range=0.1, # Random zooms
horizontal_flip=True, # Random horizontal flips
vertical_flip=False # No random vertical flips
)
Let's test the augmentation process on a few sample images and see what the augmentation looks like. The display_images function has been enhanced to either plot all the images (show_random=False) or plot random samples (show_random=True).
def display_images(image_array, labels, rows=4, cols=6, show_random=True):
"""
Display a grid of images with their corresponding labels.
Parameters:
- image_array: The array containing images (shape: (num_images, height, width, channels)).
- labels: Array of labels corresponding to the images.
- rows: Number of rows in the plot grid (default is 4).
- cols: Number of columns in the plot grid (default is 6).
- show_random: If True, displays a random subset of images; if False, displays all images.
"""
# Determine the number of images to show
num_images = rows * cols if show_random else len(image_array)
# Adjust rows and columns to fit all images if show_random is False
if not show_random:
rows = (num_images // cols) + (num_images % cols > 0)
# Dynamically set the figure size based on rows and cols
fig_width = cols * 2.5 # Adjust factor as needed for image size
fig_height = rows * 2 # Adjust factor as needed for image size
fig = plt.figure(figsize=(fig_width, fig_height))
# Adjust subplot spacing to reduce white space
# plt.subplots_adjust(hspace=0.4, wspace=0.5)
# Loop through each subplot and display images
for i in range(num_images):
# Select a random index if show_random is True, otherwise go sequentially
index = np.random.randint(0, len(labels)) if show_random else i
# Add a subplot to the grid
ax = fig.add_subplot(rows, cols, i + 1)
# Plot the image
ax.imshow(image_array[index])
# Set the title to the corresponding label
ax.set_title(labels[index])
# Remove axis ticks for better visualization
ax.axis('off')
# Display the plot
plt.show()
# Generate a small batch of augmented images
X_subset = X_train[:10] # Taking a small subset of original images for visualization
y_subset = y_train[:10] # Corresponding labels
# Create an iterator with a batch size equal to the subset size
augmented_iterator = train_datagen.flow(X_subset, y_subset, batch_size=len(X_subset), shuffle=False)
# Generate one batch of augmented images
X_augmented, y_augmented = next(augmented_iterator)
# Convert X_augmented to uint8
X_augmented = X_augmented.astype(np.uint8)
# Display original images
print("Original Images:")
display_images(X_subset, y_subset, rows=2, cols=5, show_random=False)
# Display augmented images
print("Augmented Images:")
display_images(X_augmented, y_augmented, rows=2, cols=5, show_random=False)
Original Images:
Augmented Images:
Now we need to augment the images so that the training data set contains an equal number of samples for each plant species.
# Function to augment train images for a specific class to achieve a specific target_count
def augment_class(X_class, y_class, target_count):
augmented_images = []
augmented_labels = []
current_count = X_class.shape[0]
# Continue augmenting until reaching the target count
while current_count < target_count:
for X_batch, y_batch in train_datagen.flow(X_class, y_class, batch_size=1):
augmented_images.append(X_batch[0]) # Append augmented image
augmented_labels.append(y_batch[0]) # Append label
current_count += 1
if current_count >= target_count:
break
return np.array(augmented_images), np.array(augmented_labels)
# Loop through each class in X_train and y_train
X_train_balanced = []
y_train_balanced = []
for class_label in np.unique(y_train):
# Extract images and labels for the current class
X_class = X_train[y_train == class_label]
y_class = y_train[y_train == class_label]
# Add existing images
X_train_balanced.extend(X_class)
y_train_balanced.extend(y_class)
# If the class count is less than the target, augment images
if len(X_class) < target_count:
X_aug, y_aug = augment_class(X_class, y_class, target_count)
X_train_balanced.extend(X_aug)
y_train_balanced.extend(y_aug)
# Convert lists back to arrays
X_train_balanced = np.array(X_train_balanced)
y_train_balanced = np.array(y_train_balanced)
# Convert X_train_balanced to uint8
X_train_balanced = X_train_balanced.astype(np.uint8)
print("Balanced training set shape:", X_train_balanced.shape, y_train_balanced.shape)
Balanced training set shape: (5496, 64, 64, 3) (5496,)
#Confirm that the classes are balanced now
labeled_barplot(y_train_balanced, perc=False, n=None)
# 1. Use LabelEncoder to encode the unique labels
label_encoder = LabelEncoder()
label_encoder.fit(labels) # Fit on the full labels array to capture all unique values
# 2. Encode y_train, y_val, and y_test using the fitted LabelEncoder
y_train_encoded = label_encoder.transform(y_train_balanced)
y_val_encoded = label_encoder.transform(y_val)
y_test_encoded = label_encoder.transform(y_test)
# 3. Apply one-hot encoding using tf.keras.utils.to_categorical
y_train_encoded = tf.keras.utils.to_categorical(y_train_encoded)
y_val_encoded = tf.keras.utils.to_categorical(y_val_encoded)
y_test_encoded = tf.keras.utils.to_categorical(y_test_encoded)
# Verify the shapes
print("One-hot encoded y_train shape:", y_train_encoded.shape)
print("One-hot encoded y_val shape:", y_val_encoded.shape)
print("One-hot encoded y_test shape:", y_test_encoded.shape)
One-hot encoded y_train shape: (5496, 12)
One-hot encoded y_val shape: (475, 12)
One-hot encoded y_test shape: (950, 12)
# Normalizing the image pixels for train, val and test
X_train_balanced = X_train_balanced/255
X_val = X_val/255
X_test = X_test/255
#Verify the values are now in the 0 to 1 range
X_train_balanced[0]
array([[[0.38823529, 0.30980392, 0.24313725],
[0.38039216, 0.30588235, 0.23137255],
[0.39607843, 0.33333333, 0.25490196],
...,
[0.35686275, 0.25098039, 0.14509804],
[0.35294118, 0.24313725, 0.12941176],
[0.36078431, 0.25882353, 0.14509804]],
[[0.38823529, 0.30980392, 0.24313725],
[0.38431373, 0.30588235, 0.23137255],
[0.39215686, 0.3254902 , 0.24313725],
...,
[0.3254902 , 0.21568627, 0.10980392],
[0.3372549 , 0.22745098, 0.12156863],
[0.37254902, 0.2745098 , 0.16862745]],
[[0.39215686, 0.32156863, 0.25098039],
[0.39215686, 0.31764706, 0.24313725],
[0.38823529, 0.31764706, 0.23529412],
...,
[0.34901961, 0.24705882, 0.14509804],
[0.36078431, 0.25882353, 0.16078431],
[0.38039216, 0.29803922, 0.19607843]],
...,
[[0.31372549, 0.18823529, 0.14509804],
[0.37254902, 0.23529412, 0.18039216],
[0.41568627, 0.2745098 , 0.21176471],
...,
[0.23137255, 0.16862745, 0.14901961],
[0.22352941, 0.17647059, 0.15294118],
[0.22352941, 0.18823529, 0.15294118]],
[[0.38039216, 0.24313725, 0.18039216],
[0.43137255, 0.29411765, 0.21960784],
[0.47843137, 0.34509804, 0.26666667],
...,
[0.22352941, 0.16862745, 0.14117647],
[0.22352941, 0.18823529, 0.15686275],
[0.23137255, 0.20784314, 0.17254902]],
[[0.43529412, 0.29411765, 0.21176471],
[0.48235294, 0.35294118, 0.26666667],
[0.52156863, 0.40392157, 0.31372549],
...,
[0.22745098, 0.18431373, 0.14901961],
[0.22352941, 0.19215686, 0.15686275],
[0.22745098, 0.2 , 0.16078431]]])
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
# Fixing the seed for random number generators
import random
np.random.seed(42)
random.seed(42)
tf.random.set_seed(42)
# Define the EarlyStopping callback, NOTE: EarlyStopping was tried, but did not yield better results
early_stopping = EarlyStopping(
monitor='val_loss', # Metric to monitor
patience=5, # Number of epochs with no improvement after which training will be stopped
restore_best_weights=True # Restore model weights from the epoch with the best value of the monitored metric
)
# Define the ReduceLROnPlateau callback
reduce_lr = ReduceLROnPlateau(
monitor='val_loss', # Metric to monitor
factor=0.5, # Factor by which the learning rate will be reduced
patience=3, # Number of epochs with no improvement after which learning rate will be reduced
min_lr=1e-6, # Lower bound on the learning rate
verbose=1 # Verbose output
)
# Use the same model as before
# Initializing a sequential model
model = Sequential()
# Adding the first conv layer with 64 filters and kernel size 3x3; padding 'same' keeps the output size equal to the input size
# input_shape denotes the dimensions of our 64x64 RGB seedling images
model.add(Conv2D(64, (3, 3), activation='relu', padding="same", input_shape=(64, 64, 3)))
# Adding max pooling to reduce the size of output of first conv layer
model.add(MaxPooling2D((2, 2), padding = 'same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding="same"))
model.add(MaxPooling2D((2, 2), padding = 'same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding="same"))
model.add(MaxPooling2D((2, 2), padding = 'same'))
# flattening the output of the conv layer after max pooling to make it ready for creating dense connections
model.add(Flatten())
# Adding a fully connected dense layer with 100 neurons
model.add(Dense(100, activation='relu'))
# Adding the output layer with 12 neurons and activation functions as softmax since this is a multi-class classification problem
model.add(Dense(12, activation='softmax'))
# Using Adam Optimizer
opt = Adam()
# Compile model
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
# Generating the summary of the model
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Layer (type)                   ┃ Output Shape       ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ conv2d (Conv2D)                │ (None, 64, 64, 64) │   1,792 │
│ max_pooling2d (MaxPooling2D)   │ (None, 32, 32, 64) │       0 │
│ conv2d_1 (Conv2D)              │ (None, 32, 32, 32) │  18,464 │
│ max_pooling2d_1 (MaxPooling2D) │ (None, 16, 16, 32) │       0 │
│ conv2d_2 (Conv2D)              │ (None, 16, 16, 32) │   9,248 │
│ max_pooling2d_2 (MaxPooling2D) │ (None, 8, 8, 32)   │       0 │
│ flatten (Flatten)              │ (None, 2048)       │       0 │
│ dense (Dense)                  │ (None, 100)        │ 204,900 │
│ dense_1 (Dense)                │ (None, 12)         │   1,212 │
└────────────────────────────────┴────────────────────┴─────────┘
Total params: 235,616 (920.38 KB)
Trainable params: 235,616 (920.38 KB)
Non-trainable params: 0 (0.00 B)
history_2 = model.fit(
X_train_balanced, y_train_encoded,
epochs=20,
validation_data=(X_val, y_val_encoded), # Use the predefined validation data
shuffle=True,
batch_size=64,
#callbacks=[early_stopping], # Pass the early stopping callback
callbacks=[reduce_lr], # Pass the ReduceLROnPlateau callback
verbose=2
)
Epoch 1/20   86/86 - 6s - 75ms/step - accuracy: 0.1521 - loss: 2.3182 - val_accuracy: 0.2632 - val_loss: 1.9404 - learning_rate: 0.0010
Epoch 2/20   86/86 - 1s - 12ms/step - accuracy: 0.3641 - loss: 1.6932 - val_accuracy: 0.4442 - val_loss: 1.4778 - learning_rate: 0.0010
Epoch 3/20   86/86 - 1s - 15ms/step - accuracy: 0.4778 - loss: 1.4012 - val_accuracy: 0.5137 - val_loss: 1.2708 - learning_rate: 0.0010
Epoch 4/20   86/86 - 1s - 13ms/step - accuracy: 0.5580 - loss: 1.1969 - val_accuracy: 0.5853 - val_loss: 1.0824 - learning_rate: 0.0010
Epoch 5/20   86/86 - 1s - 9ms/step - accuracy: 0.6210 - loss: 1.0198 - val_accuracy: 0.6126 - val_loss: 1.0113 - learning_rate: 0.0010
Epoch 6/20   86/86 - 1s - 16ms/step - accuracy: 0.6718 - loss: 0.9018 - val_accuracy: 0.6358 - val_loss: 0.9816 - learning_rate: 0.0010
Epoch 7/20   86/86 - 1s - 13ms/step - accuracy: 0.7123 - loss: 0.8047 - val_accuracy: 0.6611 - val_loss: 0.9297 - learning_rate: 0.0010
Epoch 8/20   86/86 - 1s - 9ms/step - accuracy: 0.7362 - loss: 0.7374 - val_accuracy: 0.6632 - val_loss: 0.9213 - learning_rate: 0.0010
Epoch 9/20   86/86 - 1s - 16ms/step - accuracy: 0.7411 - loss: 0.7087 - val_accuracy: 0.6884 - val_loss: 0.9028 - learning_rate: 0.0010
Epoch 10/20  86/86 - 1s - 15ms/step - accuracy: 0.7566 - loss: 0.6822 - val_accuracy: 0.6547 - val_loss: 0.9558 - learning_rate: 0.0010
Epoch 11/20  86/86 - 1s - 15ms/step - accuracy: 0.7868 - loss: 0.5882 - val_accuracy: 0.7011 - val_loss: 0.8964 - learning_rate: 0.0010
Epoch 12/20  86/86 - 1s - 12ms/step - accuracy: 0.8090 - loss: 0.5270 - val_accuracy: 0.7305 - val_loss: 0.8562 - learning_rate: 0.0010
Epoch 13/20  86/86 - 1s - 10ms/step - accuracy: 0.8197 - loss: 0.4941 - val_accuracy: 0.6779 - val_loss: 0.8867 - learning_rate: 0.0010
Epoch 14/20  86/86 - 1s - 10ms/step - accuracy: 0.8241 - loss: 0.4807 - val_accuracy: 0.7158 - val_loss: 0.8240 - learning_rate: 0.0010
Epoch 15/20  86/86 - 1s - 10ms/step - accuracy: 0.8466 - loss: 0.4309 - val_accuracy: 0.7368 - val_loss: 0.7957 - learning_rate: 0.0010
Epoch 16/20  86/86 - 1s - 9ms/step - accuracy: 0.8632 - loss: 0.3984 - val_accuracy: 0.7305 - val_loss: 0.8533 - learning_rate: 0.0010
Epoch 17/20  86/86 - 1s - 15ms/step - accuracy: 0.8779 - loss: 0.3486 - val_accuracy: 0.7011 - val_loss: 0.9345 - learning_rate: 0.0010
Epoch 18: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 18/20  86/86 - 1s - 14ms/step - accuracy: 0.8937 - loss: 0.3038 - val_accuracy: 0.6968 - val_loss: 0.9667 - learning_rate: 0.0010
Epoch 19/20  86/86 - 1s - 9ms/step - accuracy: 0.9061 - loss: 0.2725 - val_accuracy: 0.7284 - val_loss: 0.9151 - learning_rate: 5.0000e-04
Epoch 20/20  86/86 - 1s - 15ms/step - accuracy: 0.9245 - loss: 0.2286 - val_accuracy: 0.7284 - val_loss: 0.9374 - learning_rate: 5.0000e-04
plt.plot(history_2.history['accuracy'])
plt.plot(history_2.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
train_accuracy_2 = round(history_2.history['accuracy'][-1], 3)
val_accuracy_2 = round(history_2.history['val_accuracy'][-1], 3)
# Compute loss and accuracy on test data
test_loss_2, test_accuracy_2 = model.evaluate(X_test, y_test_encoded, verbose=2)
test_loss_2 = round(test_loss_2, 3)
test_accuracy_2 = round(test_accuracy_2, 3)
print(f"Test Loss: {test_loss_2}")
print(f"Test Accuracy: {test_accuracy_2}")
30/30 - 1s - 22ms/step - accuracy: 0.7116 - loss: 1.0265 Test Loss: 1.026 Test Accuracy: 0.712
#Confusion Matrix
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test_encoded, axis=1) # Decode one-hot encoded y_test_encoded
# Compute confusion matrix
cm = confusion_matrix(y_true, y_pred)
class_names = label_encoder.classes_ # Get class names from the label encoder
# Plot the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Confusion Matrix")
plt.show()
30/30 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step
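Beyond the confusion matrix, scikit-learn's `classification_report` summarizes per-class precision, recall, and F1 in one call. A minimal sketch on toy labels; the notebook's `y_true` and `y_pred` arrays could be passed in exactly the same way:

```python
import numpy as np
from sklearn.metrics import classification_report

# Toy stand-ins for the y_true / y_pred arrays computed above
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# One row per class: precision, recall, f1-score, and support
print(classification_report(y_true, y_pred, digits=3))
```

Passing `target_names=label_encoder.classes_` would label each row with the species name instead of the integer code.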
As a final exercise, we'll try a transfer learning approach using the InceptionV3 model. InceptionV3 was chosen because it tolerates smaller input sizes such as the 128x128 images we have here. VGGNet was not used because it typically expects higher-resolution 224x224 inputs and, per the descriptions I reviewed, does not handle smaller image sizes well. ResNet would have been another reasonable option to try.
# Since we only want to perform augmentation on the training data, we'll split into train, val, and test first
# Note, this time we are using the original 128x128x3 size images and not the reduced 64x64x3 ones.
X = rgb_images
y = labels
# Initial split: 80% train, 20% test
X_train_val, X_test, y_train_val, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# Second split: from the 80% train_val set, split off 12.5% (10% of the full data) for validation
X_train, X_val, y_train, y_val = train_test_split(
X_train_val, y_train_val, test_size=0.125, random_state=42, stratify=y_train_val
)
# Resulting splits:
# X_train, y_train -> 70% of the data (training set)
# X_val, y_val -> 10% of the data (validation set)
# X_test, y_test -> 20% of the data (test set)
# Print the shapes of the resulting datasets
print("Training set shape:", X_train.shape, y_train.shape)
print("Validation set shape:", X_val.shape, y_val.shape)
print("Test set shape:", X_test.shape, y_test.shape)
Training set shape: (3325, 128, 128, 3) (3325,) Validation set shape: (475, 128, 128, 3) (475,) Test set shape: (950, 128, 128, 3) (950,)
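The two fractions compose to the intended 70/10/20 split: 20% is held out for test, and 12.5% of the remaining 80% equals 10% of the whole. A quick check on hypothetical data (not the notebook's arrays):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 1000 toy samples across 10 balanced classes so stratification works
X = np.arange(1000).reshape(-1, 1)
y = np.repeat(np.arange(10), 100)

# Same two-stage split as above: 0.2 for test, then 0.125 of train_val for val
X_tv, X_te, y_tv, y_te = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
X_tr, X_va, y_tr, y_va = train_test_split(X_tv, y_tv, test_size=0.125, random_state=42, stratify=y_tv)

print(len(X_tr), len(X_va), len(X_te))  # 700 100 200 -> 70% / 10% / 20%
```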
# For the training labels, determine the count of the most frequent class
unique, counts = np.unique(y_train, return_counts=True)
# Find the maximum count value across all classes
target_count = counts.max()
print("Maximum count per class (target count for augmentation):", target_count)
Maximum count per class (target count for augmentation): 458
# Augmentation Parameters
train_datagen = ImageDataGenerator(
rotation_range=20, # Random rotations
width_shift_range=0.1, # Horizontal shifts
height_shift_range=0.1, # Vertical shifts
shear_range=0.1, # Shear transformations
zoom_range=0.1, # Random zooms
horizontal_flip=True, # Random horizontal flips
vertical_flip=False # Vertical flips disabled
)
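The `augment_class` helper used in the balancing loop below is defined earlier in the notebook. For reference, here is a simplified NumPy stand-in that tops up a minority class by duplicating randomly chosen images with a horizontal flip; the real helper would instead draw randomly transformed batches from `train_datagen` (rotations, shifts, shear, zoom). The function name `augment_class_numpy` and its flip-only transform are illustrative assumptions:

```python
import numpy as np

def augment_class_numpy(X_class, y_class, target_count, seed=42):
    """Top up a minority class to target_count samples.

    Simplified stand-in for the notebook's ImageDataGenerator-based helper:
    duplicates randomly chosen images and flips them horizontally.
    """
    rng = np.random.default_rng(seed)
    n_needed = target_count - len(X_class)
    idx = rng.integers(0, len(X_class), size=n_needed)
    # Flip along the width axis (axis 2 for NHWC image batches)
    X_aug = X_class[idx][:, :, ::-1, :]
    return X_aug, y_class[idx]

# Usage on dummy data: a class of 3 images topped up to 10
X_class = np.zeros((3, 64, 64, 3), dtype=np.uint8)
y_class = np.array(['Maize'] * 3)
X_aug, y_aug = augment_class_numpy(X_class, y_class, target_count=10)
print(X_aug.shape, y_aug.shape)  # (7, 64, 64, 3) (7,)
```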
# Loop through each class in X_train and y_train
X_train_balanced = []
y_train_balanced = []
for class_label in np.unique(y_train):
    # Extract images and labels for the current class
    X_class = X_train[y_train == class_label]
    y_class = y_train[y_train == class_label]
    # Add existing images
    X_train_balanced.extend(X_class)
    y_train_balanced.extend(y_class)
    # If the class count is less than the target, augment images
    if len(X_class) < target_count:
        X_aug, y_aug = augment_class(X_class, y_class, target_count)
        X_train_balanced.extend(X_aug)
        y_train_balanced.extend(y_aug)
# Convert lists back to arrays
X_train_balanced = np.array(X_train_balanced)
y_train_balanced = np.array(y_train_balanced)
# Convert X_train_balanced to uint8
X_train_balanced = X_train_balanced.astype(np.uint8)
print("Balanced training set shape:", X_train_balanced.shape, y_train_balanced.shape)
Balanced training set shape: (5496, 128, 128, 3) (5496,)
# 1. Use LabelEncoder to encode the unique labels
label_encoder = LabelEncoder()
label_encoder.fit(labels) # Fit on the full labels array to capture all unique values
# 2. Encode y_train, y_val, and y_test using the fitted LabelEncoder
y_train_encoded = label_encoder.transform(y_train_balanced)
y_val_encoded = label_encoder.transform(y_val)
y_test_encoded = label_encoder.transform(y_test)
# 3. Apply one-hot encoding using tf.keras.utils.to_categorical
y_train_encoded = tf.keras.utils.to_categorical(y_train_encoded)
y_val_encoded = tf.keras.utils.to_categorical(y_val_encoded)
y_test_encoded = tf.keras.utils.to_categorical(y_test_encoded)
# Verify the shapes
print("One-hot encoded y_train shape:", y_train_encoded.shape)
print("One-hot encoded y_val shape:", y_val_encoded.shape)
print("One-hot encoded y_test shape:", y_test_encoded.shape)
One-hot encoded y_train shape: (5496, 12) One-hot encoded y_val shape: (475, 12) One-hot encoded y_test shape: (950, 12)
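The encode-then-one-hot pipeline can be illustrated on a few toy species names. Here `np.eye` stands in for `tf.keras.utils.to_categorical` so the sketch runs without TensorFlow; both produce the same 0/1 rows:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = np.array(['Maize', 'Cleavers', 'Maize', 'Sugar beet'])

le = LabelEncoder()
codes = le.fit_transform(labels)  # classes_ are sorted alphabetically

# np.eye(n)[codes] builds one-hot rows, like to_categorical(codes)
onehot = np.eye(len(le.classes_))[codes]

print(list(le.classes_))  # ['Cleavers', 'Maize', 'Sugar beet']
print(codes)              # [1 0 1 2]
print(onehot.shape)       # (4, 3)
```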
# Normalizing the image pixels for train, val, and test
X_train_balanced = X_train_balanced/255
X_val = X_val/255
X_test = X_test/255
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
# Fixing the seed for random number generators
import random
np.random.seed(42)
random.seed(42)
tf.random.set_seed(42)
inception_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(128, 128, 3))
# Freeze layers
for layer in inception_model.layers:
    layer.trainable = False
#inception_model.summary()
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 87910968/87910968 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
# Add custom layers on top of InceptionV3 with Dropout layers
x = inception_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.2)(x) # Adding a dropout layer with a 20% dropout rate
x = Dense(64, activation='relu')(x)
x = Dropout(0.2)(x) # Adding another dropout layer with a 20% dropout rate
output = Dense(12, activation='softmax')(x)
# Create the new model
model = Model(inputs=inception_model.input, outputs=output)
# As this is a very complex model, we'll skip the usual full summary.
# Instead, we extract some key information below.
#model.summary()
# Get the total number of layers
total_layers = len(model.layers)
print(f"Total number of layers: {total_layers}")
# Calculate total, trainable, and non-trainable parameters
trainable_params = int(np.sum([np.prod(v.shape) for v in model.trainable_weights]))
non_trainable_params = int(np.sum([np.prod(v.shape) for v in model.non_trainable_weights]))
total_params = trainable_params + non_trainable_params
print(f"Total parameters: {total_params}")
print(f"Trainable parameters: {trainable_params}")
print(f"Non-trainable parameters: {non_trainable_params}")
Total number of layers: 317 Total parameters: 22074092 Trainable parameters: 271308 Non-trainable parameters: 21802784
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history_3 = model.fit(
X_train_balanced, y_train_encoded,
epochs=20,
validation_data=(X_val, y_val_encoded), # Use the predefined validation data
callbacks=[reduce_lr], # Pass the ReduceLROnPlateau callback
verbose=2
)
Epoch 1/20   172/172 - 54s - 315ms/step - accuracy: 0.2615 - loss: 2.1925 - val_accuracy: 0.4189 - val_loss: 1.6930 - learning_rate: 0.0010
Epoch 2/20   172/172 - 40s - 232ms/step - accuracy: 0.4323 - loss: 1.6453 - val_accuracy: 0.4989 - val_loss: 1.4218 - learning_rate: 0.0010
Epoch 3/20   172/172 - 5s - 28ms/step - accuracy: 0.5111 - loss: 1.3926 - val_accuracy: 0.5200 - val_loss: 1.3192 - learning_rate: 0.0010
Epoch 4/20   172/172 - 5s - 30ms/step - accuracy: 0.5620 - loss: 1.2505 - val_accuracy: 0.5726 - val_loss: 1.2391 - learning_rate: 0.0010
Epoch 5/20   172/172 - 5s - 28ms/step - accuracy: 0.6064 - loss: 1.1252 - val_accuracy: 0.5853 - val_loss: 1.2743 - learning_rate: 0.0010
Epoch 6/20   172/172 - 5s - 30ms/step - accuracy: 0.6235 - loss: 1.0674 - val_accuracy: 0.5705 - val_loss: 1.2635 - learning_rate: 0.0010
Epoch 7/20   172/172 - 5s - 27ms/step - accuracy: 0.6596 - loss: 0.9604 - val_accuracy: 0.6126 - val_loss: 1.2103 - learning_rate: 0.0010
Epoch 8/20   172/172 - 5s - 27ms/step - accuracy: 0.6814 - loss: 0.9096 - val_accuracy: 0.5895 - val_loss: 1.2312 - learning_rate: 0.0010
Epoch 9/20   172/172 - 5s - 27ms/step - accuracy: 0.7054 - loss: 0.8304 - val_accuracy: 0.6021 - val_loss: 1.2061 - learning_rate: 0.0010
Epoch 10/20  172/172 - 4s - 26ms/step - accuracy: 0.7180 - loss: 0.7909 - val_accuracy: 0.6189 - val_loss: 1.2040 - learning_rate: 0.0010
Epoch 11/20  172/172 - 5s - 29ms/step - accuracy: 0.7371 - loss: 0.7474 - val_accuracy: 0.6042 - val_loss: 1.3093 - learning_rate: 0.0010
Epoch 12/20  172/172 - 5s - 26ms/step - accuracy: 0.7455 - loss: 0.7295 - val_accuracy: 0.6105 - val_loss: 1.2678 - learning_rate: 0.0010
Epoch 13: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 13/20  172/172 - 5s - 27ms/step - accuracy: 0.7647 - loss: 0.6624 - val_accuracy: 0.5874 - val_loss: 1.3211 - learning_rate: 0.0010
Epoch 14/20  172/172 - 5s - 28ms/step - accuracy: 0.7913 - loss: 0.5789 - val_accuracy: 0.6147 - val_loss: 1.2973 - learning_rate: 5.0000e-04
Epoch 15/20  172/172 - 5s - 29ms/step - accuracy: 0.8011 - loss: 0.5551 - val_accuracy: 0.6232 - val_loss: 1.3077 - learning_rate: 5.0000e-04
Epoch 16: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 16/20  172/172 - 5s - 27ms/step - accuracy: 0.8090 - loss: 0.5174 - val_accuracy: 0.6295 - val_loss: 1.3188 - learning_rate: 5.0000e-04
Epoch 17/20  172/172 - 4s - 26ms/step - accuracy: 0.8397 - loss: 0.4675 - val_accuracy: 0.6316 - val_loss: 1.2922 - learning_rate: 2.5000e-04
Epoch 18/20  172/172 - 5s - 29ms/step - accuracy: 0.8372 - loss: 0.4531 - val_accuracy: 0.6337 - val_loss: 1.2570 - learning_rate: 2.5000e-04
Epoch 19: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 19/20  172/172 - 6s - 33ms/step - accuracy: 0.8413 - loss: 0.4490 - val_accuracy: 0.6253 - val_loss: 1.3118 - learning_rate: 2.5000e-04
Epoch 20/20  172/172 - 5s - 27ms/step - accuracy: 0.8493 - loss: 0.4253 - val_accuracy: 0.6421 - val_loss: 1.3065 - learning_rate: 1.2500e-04
plt.plot(history_3.history['accuracy'])
plt.plot(history_3.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
train_accuracy_3 = round(history_3.history['accuracy'][-1], 3)
val_accuracy_3 = round(history_3.history['val_accuracy'][-1], 3)
# Compute loss and accuracy on test data
test_loss_3, test_accuracy_3 = model.evaluate(X_test, y_test_encoded, verbose=2)
test_loss_3 = round(test_loss_3, 3)
test_accuracy_3 = round(test_accuracy_3, 3)
print(f"Test Loss: {test_loss_3}")
print(f"Test Accuracy: {test_accuracy_3}")
30/30 - 7s - 226ms/step - accuracy: 0.6758 - loss: 1.2048 Test Loss: 1.205 Test Accuracy: 0.676
#Confusion Matrix
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test_encoded, axis=1) # Decode one-hot encoded y_test_encoded
# Compute confusion matrix
cm = confusion_matrix(y_true, y_pred)
class_names = label_encoder.classes_ # Get class names from the label encoder
# Plot the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Confusion Matrix")
plt.show()
30/30 ━━━━━━━━━━━━━━━━━━━━ 12s 199ms/step
The Base CNN model has the best test accuracy and the smallest gap between train and test accuracy (suggesting it suffers the least from overfitting). It is clearly the best of the three and is selected as the Final Model.
train_param_1 = "235,616"
train_param_2 = "235,616"
train_param_3 = "271,308"
# Create the Markdown table as a formatted string
table_md = f"""
| Model / Acc | Train | Val | Test | Trainable Params |
|-------------|-------|-----|------|------------------|
| Base CNN | {train_accuracy_1} | {val_accuracy_1} | {test_accuracy_1} | {train_param_1} |
| Base CNN with Aug | {train_accuracy_2} | {val_accuracy_2} | {test_accuracy_2} | {train_param_2} |
| InceptionV3 with Aug | {train_accuracy_3} | {val_accuracy_3} | {test_accuracy_3} | {train_param_3} |
"""
# Display the table
from IPython.display import Markdown, display
display(Markdown(table_md))
| Model / Acc | Train | Val | Test | Trainable Params |
|---|---|---|---|---|
| Base CNN | 0.837 | 0.755 | 0.729 | 235,616 |
| Base CNN with Aug | 0.924 | 0.728 | 0.712 | 235,616 |
| InceptionV3 with Aug | 0.849 | 0.642 | 0.676 | 271,308 |
Now let's take a few sample images from the test data and run them through the final model to check its predictions.
# Select a few test samples
num_samples = 5 # Number of test images to check
sample_indices = np.random.choice(len(X_test_final), num_samples, replace=False)
sample_images = X_test_final[sample_indices]
sample_labels = y_test_final[sample_indices] # Actual labels
sample_labels_encoded = y_test_encoded_final[sample_indices] # Encoded labels for prediction
# Run the samples through the model to get predictions
predictions = final_model.predict(sample_images)
predicted_labels = np.argmax(predictions, axis=1) # Decode one-hot predictions
print(sample_labels)
print(predicted_labels)
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 63ms/step ['Scentless Mayweed' 'Common wheat' 'Maize' 'Loose Silky-bent' 'Cleavers'] [ 8 4 11 6 2]
# Create label mapping using label_encoder_final
label_mapping = {index: label for index, label in enumerate(label_encoder_final.classes_)}
for i in range(num_samples):
    # Convert one-hot encoded labels to class indices
    true_label_index = np.argmax(sample_labels_encoded[i])
    predicted_label_index = predicted_labels[i]  # Directly use the predicted label index
    # Map indices to original string labels
    true_label = label_mapping[true_label_index]
    predicted_label = label_mapping[predicted_label_index]
    # Display the image with true and predicted labels
    plt.imshow(sample_images[i])
    plt.title(f"True: {true_label}, Predicted: {predicted_label}")
    plt.axis('off')
    plt.show()
In the samples above, Maize was incorrectly predicted as Sugar Beet, a confusion the Confusion Matrix shows the model sometimes makes with Maize.
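To dig into mistakes like this, it can help to print the model's top few class probabilities rather than only the argmax. A sketch on a hypothetical softmax row; in the notebook this would be one row of `predictions`, with names taken from `label_encoder_final.classes_` (the three-class `class_names` and `probs` values here are made up for illustration):

```python
import numpy as np

class_names = np.array(['Cleavers', 'Maize', 'Sugar beet'])  # hypothetical subset
probs = np.array([0.05, 0.30, 0.65])                         # hypothetical softmax row

# Indices of the top-3 probabilities, highest first
top3 = np.argsort(probs)[::-1][:3]
for i in top3:
    print(f"{class_names[i]}: {probs[i]:.2f}")
```

A close second-place probability for the true class would indicate a near miss rather than a confident error.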